The k-Anonymity Problem Is Hard
Authors
Abstract
The problem of publishing personal data without giving up privacy is becoming increasingly important. An interesting formalization recently proposed is k-anonymity. This approach requires that the rows of a table be clustered into sets of size at least k and that all the rows in a cluster become the same tuple after the suppression of some entries. The natural optimization problem, where the goal is to minimize the number of suppressed entries, is known to be NP-hard when the values are over a ternary alphabet, k = 3, and the row length is unbounded. In this paper we give a lower bound on the approximation factor that any polynomial-time algorithm can achieve on two restrictions of the problem, namely (i) when the record values are over a binary alphabet and k = 3, and (ii) when the records have length at most 8 and k = 4, showing that these restrictions of the problem are APX-hard.
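As a small illustration of the definition in the abstract, the following Python sketch takes a table and an already chosen partition of its rows into clusters, suppresses every entry in a column where the rows of a cluster disagree, and counts the suppressed entries. It does not search for an optimal partition, which is the hard part of the problem. The names `STAR`, `suppress_cluster`, `anonymize`, and `is_k_anonymous` are hypothetical and chosen only for this example; they do not come from the paper.

```python
from collections import Counter

STAR = "*"  # assumed marker for a suppressed entry (not from the paper)


def suppress_cluster(rows):
    """Suppress each column on which the rows of one cluster disagree."""
    columns = list(zip(*rows))
    agree = [len(set(col)) == 1 for col in columns]
    return [tuple(v if ok else STAR for v, ok in zip(row, agree)) for row in rows]


def anonymize(table, clusters, k):
    """Apply a given partition (lists of row indices) to the table.

    Returns the suppressed table and the number of suppressed entries.
    The partition is an input; finding a cheapest one is not attempted here.
    """
    if any(len(cluster) < k for cluster in clusters):
        raise ValueError("every cluster must contain at least k rows")
    covered = sorted(i for cluster in clusters for i in cluster)
    if covered != list(range(len(table))):
        raise ValueError("clusters must partition the row indices")

    result = [None] * len(table)
    cost = 0
    for cluster in clusters:
        new_rows = suppress_cluster([table[i] for i in cluster])
        for i, row in zip(cluster, new_rows):
            result[i] = row
            cost += row.count(STAR)
    return result, cost


def is_k_anonymous(table, k):
    """True if every distinct row occurs at least k times."""
    return all(n >= k for n in Counter(table).values())


# Binary alphabet, k = 3, six rows of length 3.
table = [(0, 1, 1), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1)]
clusters = [[0, 1, 2], [3, 4, 5]]
anonymized, cost = anonymize(table, clusters, k=3)
print(anonymized, cost)
```

With this partition the first cluster agrees only on its first column, so two entries per row are suppressed there, while the second cluster needs no suppression. A different partition of the same table could give a different cost, which is why the choice of clusters is the optimization target.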
Similar resources
Improved Univariate Microaggregation for Integer Values
Privacy issues during data publishing are an increasing concern of involved entities. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational priva...
Resolving the Complexity of Some Data Privacy Problems
We formally study two methods for data sanitation that have been used extensively in the database community: k-anonymity and l-diversity. We settle several open problems concerning the difficulty of applying these methods optimally, proving both positive and negative results: – 2-anonymity is in P. – The problem of partitioning the edges of a triangle-free graph into 4-stars (degree-three verti...
Parameterized Complexity of k-Anonymity: Hardness and Tractability
The problem of publishing personal data without giving up privacy is becoming increasingly important. A clean formalization that has been recently proposed is k-anonymity, where the rows of a table are partitioned into clusters of size at least k and all rows in a cluster become the same tuple after the suppression of some entries. The natural optimization problem, where the goal is to minim...
Pattern-Guided k-Anonymity
We suggest a user-oriented approach to combinatorial data anonymization. A data matrix is called k-anonymous if every row appears at least k times—the goal of the NP-hard k-ANONYMITY problem then is to make a given matrix k-anonymous by suppressing (blanking out) as few entries as possible. Building on previous work and coping with corresponding deficiencies, we describe an enhanced k-anonymiza...
The Effect of Homogeneity on the Complexity of k-Anonymity
The NP-hard k-Anonymity problem asks, given an n × m matrix M over a fixed alphabet and an integer s > 0, whether M can be made k-anonymous by suppressing (blanking out) at most s entries. A matrix M is said to be k-anonymous if for each row r in M there are at least k − 1 other rows in M which are identical to r. Complementing previous work, we introduce two new “data-driven” parameterizations fo...
Checking for k-Anonymity Violation by Views
When a private relational table is published using views, secrecy or privacy may be violated. This paper uses a formally defined notion of k-anonymity to measure disclosure by views, where k > 1 is a positive integer. Intuitively, violation of k-anonymity occurs when a particular attribute value of an entity can be determined to be among fewer than k possibilities by using the views together with ...